In 1990, the National Assessment of Educational Progress 
(NAEP) included a Trial State Assessment which, for the first time in 
NAEP's history, made voluntary state-by-state assessments. This 1992 
mathematics report marks the first attempt of the National Center for 
Education Statistics to shift to standards-based reporting of National 
Assessment statistics. NAEP results are reported by achievement levels which 
are descriptions of how students should perform relative to a body of content 
reflected in the NAEP frameworks; in other words, how much students should 
know. The 1992 assessment covered six mathematics content areas: (1) numbers 
and operations; (2) measurement; (3) geometry; (4) data analysis, statistics, 
and probability; (5) algebra and functions; and (6) estimation. In 
Pennsylvania, 2,740 fourth-grade students in 111 public schools and 2,640 
eighth-grade students in 99 public schools were assessed. This report 
describes the mathematics performance of Pennsylvania fourth- and 
eighth-grade students in public schools and compares their overall 
performance to students in the Northeast region of the United States and the 
nation. The distribution of the results is provided for subpopulations of 
students, including race/ethnicity; type of community (advantaged urban, 
disadvantaged urban, extreme rural, and other); parents' education level; 
gender; and content area performance. To provide a context 
for understanding students' mathematics proficiency, students, their 
mathematics teachers, and principals completed questionnaires which focused 
on: what are students taught? (curriculum coverage, homework, and 
instructional emphasis); how is mathematics instruction delivered? 
(resources, collaborating in small groups, using mathematical objects, and 
materials); how are calculators and computers used? (access and use of 
calculators, availability of computers, and when to use a calculator); who is 
teaching mathematics? (educational background); and conditions beyond school 
that facilitate mathematics learning and teaching (amount of reading 
materials in the home, hours of television watched per day, student 
absenteeism, and students' perceptions of mathematics). The average 
proficiency of fourth-grade students in Pennsylvania on the NAEP mathematics 
scale was 223 compared to 217 nationwide; for Pennsylvania eighth-grade 
students the average proficiency was 271 compared to 266 nationwide. (ASK) 
The National Assessment of Educational Progress (NAEP) is a Congressionally mandated project of the 
National Center for Education Statistics (NCES) that has collected and reported information for nearly 25 
years on what American students know and what they can do. It is the nation’s only ongoing, comparable, 
and representative assessment of student achievement. Its tests are given to scientific samples of youths 
attending both public and private schools and enrolled in grades four, eight, or twelve. The test items are 
written around a framework prepared for each content area — reading, writing, mathematics, science, and 
others — that represents the consensus of groups of curriculum experts, educators, members of the general 
public, and user groups on what should be covered on such a test. Reporting includes means and 
distributions of scores, as well as more descriptive information about the meaning of different points on the 
NAEP scale. 



A Recent History of NAEP Reporting 



Over time there have been many changes in emphasis of NAEP testing and reporting both to take advantage 
of new technologies and to reflect changing trends in education. In 1984, a new technology called Item 
Response Theory (IRT) made it possible to create “scale scores” for NAEP similar to those the public was 
accustomed to seeing for the annual Scholastic Aptitude Tests (SAT). Educational Testing Service, in its 
role as Government grantee carrying out NAEP operations, devised a new way to describe performance 
against this scale, called “anchor levels.” Starting in 1984, NAEP results were reported by “anchor levels.” 
Anchor levels describe distributions of performance at selected points along the NAEP scale (i.e., standard 
deviation units). Anchor levels show how groups of students perform relative to each other, but not 
whether this performance is adequate. 

In 1988, Congress authorized a new aspect of NAEP that allowed states and territories to participate 
voluntarily in a trial state assessment, using samples representative of their own students, to provide 
state-level data comparable to the nation and each of the other participating jurisdictions. Pursuant to that 
law, in 1990, the mathematics achievement of eighth graders was assessed in 40 jurisdictions (states, 
territories, and the District of Columbia). The results were reported in The State of Mathematics 
Achievement : NAEP’s 1990 Assessment of the Nation and the Trial Assessment of the States (Washington, 
DC: National Center for Education Statistics, 1991). 
In the same 1988 law, Congress established the National Assessment Governing Board (NAGB), assigning 
it broad policy making authority over NAEP, including the authority to take “appropriate actions ... to 
improve the form and use of the National Assessment” and to identify “appropriate achievement goals for 
each . . . grade and subject area to be tested in the National Assessment.” To carry out its responsibilities, 
NAGB developed achievement levels, which are collective judgments about how students should perform, 
translated into ranges along the NAEP scale. The process was conducted for NAGB under contract by 
American College Testing (ACT), which has extensive experience in standard-setting in many fields. The 
standards setting process began with questions such as, “What should students know and be able to do if 
they are proficient in mathematics in the fourth, eighth, or twelfth grade?” The National Assessment 
Governing Board, after wide consultation including public hearings, developed statements to describe what 
students should know and be able to do at three levels of proficiency -- “Basic,” “Proficient,” and 
“Advanced” -- for each of the three NAEP grades. A panel of expert and broadly representative judges 
evaluated each NAEP item, judged the proportion of students at each level which should answer the items 
correctly, and made recommendations that resulted in points along the NAEP scale that corresponded with 
the minimum score for each of these levels. 

In 1990, after Congress had mandated pilot testing at the State level to supplement what had only been 
conducted for the Nation and four large regions, the more rigorous content of the mathematics standards 
prepared by the National Council of Teachers of Mathematics began to influence the NAEP frameworks. 

Also in 1990, the President and the nation’s 50 governors adopted six National Education Goals, including 
one that calls for American students to “leave grades 4, 8, and 12 having demonstrated competency in 
challenging subject matter, including English, mathematics, science, history, and geography.” The adoption 
of this goal highlighted a perceived deficiency in the Nation’s ability to report on the performance of 
students relative to standards developed through a consensus process. 



A Transition Phase in Reporting 



This 1992 mathematics report marks NCES’s first attempt to shift to standards-based reporting of National 
Assessment statistics. The transition is being made now to report NAEP results by “achievement levels.” 
Achievement levels describe how students should perform relative to a body of content reflected in the 
NAEP frameworks (i.e., how much students should know). The impetus for this shift lies in the belief that 
NAEP data will take on more meaning for the public if they show what proportion of our youth are able 
to meet standards of performance necessary for a changing world. Chapter 1 of the report describes how 
the 1992 standards were prepared and provides examples of test exercises that illustrate the mathematics 
content reflected in the descriptions of the NAEP achievement levels. 
Reporting NAEP results on the basis of achievement levels represents a significant change in practice for 
NCES. On occasion, this agency makes use of emerging analytical approaches that permit new, and 
sometimes controversial, analyses to be done. Just as other statistical agencies do when introducing new 
measures to supplement or replace old measures, NCES has in this report provided the data according to 
the earlier procedures in addition to the new procedures. For this reason, in addition to NAEP results 
reported according to achievement levels, results according to the scale anchoring procedure that has been 
used since the 1984 assessment can be found in an appendix to this report. Presenting the data both ways 
gives the public — not just technical evaluators — an opportunity to be informed, so that all data users will 
be able to assess for themselves how well the various forms of reporting and interpreting the data meet their 
needs. 



Technical Review of NCES Reports 

All reports published by NCES are evaluated through an adjudication procedure. This process represents 
a final quality control check designed to assure that all publications conform to statistical standards, are 
grounded in the data, and take into account relevant substantive research literature. The adjudication 
process also attempts to delete misleading interpretive statements, and provide text that is clear and 
understandable to the American public. During the adjudication of this report neither the process for setting 
achievement levels developed by ACT nor the scores representing each level was addressed. The process 
and the cutpoints were taken as a given. The issue of valid inferences was addressed, however. A number 
of reviewers interpreted statements about what students should do at the various achievement levels 
according to the standards set by NAGB as statements about what students can do. Independent studies 
are being conducted concerning the appropriate inferences that can be drawn from the NAEP results 
reported by achievement levels. Early results from technical evaluations suggested that this apparently 
logical step in interpretation might not be justified after closer examination of the data about what students 
at these levels actually demonstrate in terms of mathematical competencies. Discussion about the 
achievement levels also raised questions about the need for validity evidence for the anchor levels, as well 
as for greater understanding of the underlying assumptions of the process by which they were 
developed. 1 

This issue led NCES to seek the advice of several technical committees and to convene a meeting of 
technical and policy experts. Members, staff, and contractors of the National Assessment Governing Board 
participated in this meeting. Altogether these activities provided a forum for discussion of various historical 
and proposed approaches to interpreting the NAEP scale. In order to better inform the public about these 
and other interpretation issues, a companion NCES report entitled Interpreting NAEP Scales (Washington, 
DC: National Center for Education Statistics, 1993) explains several approaches to reporting information 
from NAEP. 



1 R.A. Forsyth. “Do NAEP Scales Yield Valid Criterion-referenced Interpretations?” Educational Measurement: Issues and Practice, 
10 (1991), pp. 3-9, 16. 
Then the next question is: Through their performance on the NAEP items, what actual knowledge and 
abilities did students demonstrate? Chapters 1 - 7 of this report include information on overall means and 
on distributions of scores, all taken directly from the test item data. The Appendix addresses this question 
in the manner that NAEP has used since 1985, using anchor points. As implemented for this report, the 
scale anchoring process provides a concise summary of what students know and can do at various points 
along the scale that differentiates them from students performing at lower levels. First, students performing 
at or around four points on the scale were identified (200, 250, 300, and 350, spaced one 
standard deviation unit apart). Next, questions were identified that were answered correctly by 65 percent 
or more of the students at one level and by fewer than half of the students at the next lower level. Finally, 
mathematics educators were asked to analyze each anchor-level question and create summary descriptions 
of the knowledge and skills evidenced by students who answered these sets of questions successfully. The 
critical distinction here is that anchor levels attempt to describe what students can do at and around selected 
points on the NAEP scale; achievement levels attempt to describe what students should be able to do in 
various ranges of the NAEP scale. 
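
The item-selection step of the scale anchoring process can be illustrated with a minimal sketch. It assumes that, for each question, the proportion of correct answers among students performing at or around each anchor point is already known. The function name, the item labels, and the proportions below are hypothetical and are not taken from the NAEP item pool. The sketch covers only anchor points that have a next lower point, because the rule for the lowest point is not stated above.

```python
from typing import Dict, List

ANCHOR_POINTS = [200, 250, 300, 350]  # anchor points named in the text


def anchor_items(p_correct: Dict[str, Dict[int, float]],
                 anchor_points: List[int] = ANCHOR_POINTS,
                 upper: float = 0.65,
                 lower: float = 0.50) -> Dict[int, List[str]]:
    """Group items by the anchor point at which they anchor.

    p_correct maps an item label to {anchor point: proportion correct among
    students performing at or around that point}.
    """
    result: Dict[int, List[str]] = {point: [] for point in anchor_points}
    for item, by_point in p_correct.items():
        # Compare each anchor point with the next lower one.
        for lower_pt, this_pt in zip(anchor_points, anchor_points[1:]):
            # 65 percent or more correct at this level, and fewer than
            # half correct at the next lower level.
            if by_point[this_pt] >= upper and by_point[lower_pt] < lower:
                result[this_pt].append(item)
                break
    return result


# Hypothetical proportions correct, for illustration only.
example_items = {
    "item_A": {200: 0.30, 250: 0.70, 300: 0.90, 350: 0.97},
    "item_B": {200: 0.20, 250: 0.45, 300: 0.66, 350: 0.90},
    "item_C": {200: 0.55, 250: 0.72, 300: 0.85, 350: 0.95},
}
anchored = anchor_items(example_items)
print(anchored)
```

In this illustration item_A anchors at 250 and item_B at 300, while item_C anchors nowhere because at no point do fewer than half of the students at the next lower level answer it correctly.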

Future Work 

These achievement level standards are in the second round (the first being in 1990) in a developmental 
process which has been revised and is still under review through several studies. 2 The Board’s goal is to 
provide a statement of what American students should be able to do as a standard that can give more 
meaning to the NAEP data. They then want to use the NAEP data to inform the nation as to how many 
students actually can meet these standards. 

NCES realizes that modifications and improvements may be necessary in the future as current procedures 
are evaluated and new approaches are considered. NCES conceives of this process as a research and 
developmental activity in which numerous statistical, psychometric, and substantive issues must be resolved. 
At the present time the effort is hampered by the problem of trying to create standards on a given framework 
and item pool developed for another purpose. In the future the measurement of standards will be a more 
prominent influence on the development of NAEP procedures. 



2 Assessing Student Achievement In the States. The First Report of the National Academy of Education Panel on the Evaluation 
of the NAEP Trial State Assessment: 1990 Trial State Assessment. (Stanford, CA: National Academy of Education, 1992); 
R.L. Linn, D.M. Koretz, E.L. Baker, and L. Burstein. The Validity and Credibility of the Achievement Levels for the 1990 National 
Assessment of Educational Progress in Mathematics , Technical Report CSE No. 330. (Los Angeles, CA: Center for Research on 
Evaluation, Standards, and Student Testing, UCLA, June, 1991). 
The goal of the National Center for Education Statistics is to make data available for the public and to do 
so in accurate and understandable ways that are not misleading. In this case, much of what matters in 
NAEP is changing: 

• the content in response to the developing standards of various curricular groups; 

• the test items in response to new developments in assessments; and 

• the reporting in response to an increasing interest in student achievement relative to 
standards of student performance. 

We believe that the numerous completed and ongoing studies will lead to national debate that will assure 
the public is well informed about these issues -- as informed they must be because the results will be a vital 
influence on what Americans come to think about the condition and progress of our schools. 

In addition, the public needs the data in this report to see for themselves what standards-based reporting 
might do and to evaluate the often conflicting claims of adherents and detractors of these changes in 
approaches to reporting on the educational achievement of American students. The Center eventually 
wants to use the achievement levels to describe what students know and can do. In order to accomplish 
that, the frameworks, tests, and achievement levels may need to be developed in tandem. That is easier to 
say than to do, however, because it implies a substantially larger pool of test exercises, carefully designed 
to support reporting about performance relative to a set of performance standards. Clearly this is a 
developmental effort that will take time and several iterations, during which data supporting appropriate 
inferences about the performance of American students will continue to be gathered. 
In 1988, Congress passed new legislation for the National Assessment of Educational Progress (NAEP) that 
continued its primary mission of providing dependable and comprehensive information about educational 
progress in the United States. In addition, for the first time in the project's history, the legislation also 
included a provision authorizing voluntary, state-by-state assessments on a trial basis. 

As a result of the legislation, the 1990 NAEP program included a Trial State Assessment Program that 
assessed public-school students in 37 states, the District of Columbia, and two territories in eighth-grade 
mathematics. 3 The 1992 NAEP program included an expanded Trial State Assessment Program in fourth- 
and eighth-grade mathematics and fourth-grade reading, with public-school students assessed in 41 states, 
the District of Columbia, and two territories. In addition, national assessments in mathematics, reading, 
writing, and science were conducted concurrently with the Trial State Assessment Program in 1990 and in 
1992. 

In Pennsylvania in 1992, 111 public schools participated in the fourth-grade mathematics assessment, and 
99 participated in the eighth-grade mathematics assessment. The weighted school participation rate was 
95 percent in fourth grade and 94 percent in eighth grade, which means that the fourth-grade students in 
this sample of schools were representative of 95 percent of all the fourth-grade public-school students in 
Pennsylvania, and the eighth-grade students in this sample of schools were representative of 94 percent of 
all the eighth-grade public-school students in Pennsylvania. 

In total, 2,740 fourth-grade and 2,640 eighth-grade Pennsylvania public-school students were assessed in 
mathematics. The weighted student participation rate was 96 percent in grade 4 and 94 percent in 
grade 8. This means that the sample of students who took part in the assessment was representative of 
96 percent and 94 percent of the eligible fourth-grade and eighth-grade public-school student populations 
in participating schools in Pennsylvania (that is, all students minus those excluded from the assessment). 
The overall weighted response rate (school rate times student rate) was 91 percent in fourth grade and 
89 percent in eighth grade. This means that the sample of students who participated in the assessment was 
representative of 91 percent and 89 percent of the eligible fourth- and eighth-grade public-school student 
populations in Pennsylvania, respectively. 
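
As a worked check of the arithmetic, the overall weighted response rate can be recomputed from the rounded school and student rates quoted above. This is a minimal sketch; the function name is hypothetical. From the rounded inputs, grade 4 reproduces the reported 91 percent, while grade 8 comes out at about 88 percent rather than the reported 89 percent. The likely explanation, stated here as an assumption, is that the published figure was computed from unrounded rates.

```python
def overall_rate(school_rate_pct: float, student_rate_pct: float) -> float:
    """Overall weighted response rate in percent: school rate times student rate."""
    return school_rate_pct * student_rate_pct / 100.0


# Rounded rates as quoted in the text (percent).
grade4 = overall_rate(95, 96)
grade8 = overall_rate(94, 94)

print(f"Grade 4 overall rate: {grade4:.1f} percent")
print(f"Grade 8 overall rate: {grade8:.1f} percent")
```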

3 For a summary of the 1990 program, see Ina V.S. Mullis, John A. Dossey, Eugene H. Owen, and Gary W. Phillips. The State 
of Mathematics Achievement: NAEP's 1990 Assessment of the Nation and the Trial Assessment of the States. (Washington, DC: 
National Center for Education Statistics, 1991). 
Students’ performance in mathematics was summarized on the NAEP mathematics scale, which ranges 
from 0 to 500. 



Grade 4 (1992): The average proficiency of public-school students from Pennsylvania on the NAEP 
mathematics scale was 223. This proficiency was higher than that of students across the 
nation (217). 4 The lowest performing 10 percent of the students from Pennsylvania had 
proficiencies below 181 while the top 10 percent of the students had proficiencies above 
262. 

Grade 8 (1992): The average proficiency of public-school students from Pennsylvania on the NAEP 
mathematics scale was 271. This proficiency was higher than that of students across the 
nation (266). The lowest performing 10 percent of the students in Pennsylvania had 
proficiencies below 225 while the top 10 percent of the students had proficiencies above 
314. 

Grade 8 (1990 vs 1992): The average proficiency of public-school students in Pennsylvania in 1992 was about the 
same as the average proficiency in 1990 (271 in 1992 and 266 in 1990). In Pennsylvania, 
the score that signified the 10th percentile in 1992 (225) was about the same as the score 
that signified the 10th percentile in 1990 (222). Similarly, the score that signified the 90th 
percentile in 1992 (314) was about the same as the score that signified the 90th percentile 
in 1990 (309). 



LEVELS OF ACHIEVEMENT 



When Congress established the National Assessment Governing Board (NAGB) in 1988 to set policy for 
NAEP, it charged the board with “identifying appropriate achievement goals for each age and grade in each 
subject area to be tested under the National Assessment” (Pub. L. 100-297, Section 3403(a)(5)(B)(ii)). 



NAGB developed three achievement levels for each grade: Basic, Proficient, and Advanced. Performance 
at the Basic level denotes partial mastery of the knowledge and skills that are fundamental for proficient 
work at each grade level. The central level, called Proficient, represents solid academic performance at each 
grade level tested. Students reaching this level demonstrate competency over challenging subject matter and 
are well prepared for the next level of schooling. Achievement at the Advanced level signifies superior 
performance at the grade tested. 

Grade 4 (1992): More than half of the students in public schools in Pennsylvania (66 percent), versus 
59 percent in the nation, are at or above the Basic level. About one quarter of the 
students in Pennsylvania (23 percent), versus 18 percent in the nation, are at or above 
the Proficient level. Relatively few of the students in Pennsylvania (3 percent), versus 
2 percent in the nation, are at or above the Advanced level. 
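
Because these percentages are cumulative ("at or above" each level), the share of students within each band follows by subtraction. The sketch below works through that arithmetic for the fourth-grade figures quoted above. The function name and the label "Below Basic" are assumptions made for illustration, and since the inputs are rounded percentages the band shares are approximate.

```python
from typing import Dict


def band_shares(at_or_above: Dict[str, int]) -> Dict[str, int]:
    """Convert cumulative 'at or above' percentages into per-band percentages."""
    basic = at_or_above["Basic"]
    proficient = at_or_above["Proficient"]
    advanced = at_or_above["Advanced"]
    return {
        "Below Basic": 100 - basic,          # did not reach the Basic level
        "Basic": basic - proficient,         # Basic but not Proficient
        "Proficient": proficient - advanced, # Proficient but not Advanced
        "Advanced": advanced,
    }


# Grade 4 percentages at or above each level, as quoted in the text.
pennsylvania = band_shares({"Basic": 66, "Proficient": 23, "Advanced": 3})
nation = band_shares({"Basic": 59, "Proficient": 18, "Advanced": 2})

print("Pennsylvania:", pennsylvania)
print("Nation:", nation)
```

Read this way, roughly 34 percent of Pennsylvania fourth graders fall below the Basic level, compared with roughly 41 percent nationally.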



4 Differences reported are statistically significant at the 95 percent confidence level. This means that with 95 percent confidence, 
there is a real difference in the average mathematics proficiency between the two populations of interest. “About the same” 
means that no statistically significant difference was found at the 95 percent confidence level. 